REMARKS

Claims 1-17, all the claims pending in the application, stand rejected on prior art grounds.
Applicants respectfully traverse these rejections based on the following discussion. 

I. The Prior Art Rejections

Claims 1, 6, and 11 stand rejected under 35 U.S.C. §103(a) as being unpatentable over
Kostoff et al., hereinafter "Kostoff" (U.S. Patent No. 5,440,481). Claims 2-5, 7-10, and 12-17
stand rejected under 35 U.S.C. §103(a) as being unpatentable over Kostoff and in further view of
Kirsch et al., hereinafter "Kirsch" (U.S. Patent No. 6,070,158), Kobayashi (U.S. Patent No.
5,742,834) and Tumey (U.S. Patent No. 6,470,307). Applicants respectfully traverse these
rejections based on the following discussion.

A. The Rejection Based on Kostoff 

In response to previous arguments, the Office Action accurately states (on pages 14-15)
that the claimed invention limits the dictionary to the most frequently occurring terms, as limited
by the preset maximum dictionary size. Then, the claimed invention can search the associated
document for phrases that contain only these terms and produce a dictionary of most frequently
occurring phrases and terms. By using the maximum dictionary size as the vehicle to control how
many terms are to be used in the phrase search, the invention provides an automated
methodology which, without additional user input, reduces the size of the data that must be
processed.

The Office Action argues, on pages 14-15, that because Kostoff removes a manually
created trivial phrase list from the dictionary before using the dictionary to search for phrases in
the associated documents, one ordinarily skilled in the art would be motivated to take efforts to
reduce the dictionary size before searching for phrases, as in the claimed invention.

In other words, the Office Action presents an argument that, by limiting the dictionary to
only the most frequently occurring words (as limited by the maximum dictionary size), the
claimed invention essentially removes all "trivial" words from the dictionary before searching for
phrases. Since Kostoff teaches that all trivial words should be removed from the dictionary
before searching for phrases, the Office Action argues that Kostoff would have suggested the
claimed invention to one ordinarily skilled in the art.

While the argument in the Office Action is quite creative, it is Applicants' position that
the claimed methodology is fully automated (the only input required being the maximum
dictionary size, which can simply be equal to the available memory or manually preset by the
user), while Kostoff requires the user to manually create the trivial phrase list (col. 4, lines 39-
42). The efficiency gains of the automated inventive methodology when compared to the manual
system described in Kostoff are substantial.

Further, the removal of trivial words is the same as the removal of a manually created list
of "stop" words as defined by dependent claims 2-3, 7-8, and 12-13. The rules of claim
differentiation and construction provide that if a first feature is defined in one portion of a claim,
each other feature is distinguishable from that first feature. Here, the removal of a manually
created list of trivial phrases in Kostoff is equivalent to the claimed removal of a manually
created list of stop words. Thus, the claimed method of limiting the dictionary according to a
maximum size is a distinct feature from the removal of trivial or stop words and phrases.
Therefore, it is Applicants' position that the discussion in Kostoff regarding the list of trivial words and
phrases teaches no more than what is performed when the claimed invention removes stop words.
There is nothing within Kostoff which would suggest that this removal of trivial or stop words
would lead one ordinarily skilled in the art to limit the words in the dictionary according to a
maximum dictionary size.

The creation of a manual list of trivial words ("to", "if", etc.) and its removal from the
dictionary does not suggest the claimed automated methodology which simply and automatically
limits the dictionary using a size limit. It is Applicants' position that the requirement that a
manually created list be used to limit the dictionary size teaches away from the claimed
automated methodology which does not require the user to specify any words, but instead merely
eliminates the least frequent words from the dictionary. Further, the claimed invention may
actually include all "trivial" words (if these stop words are not otherwise removed as provided in
the dependent claims) as these words may be the most common. Again, the claimed invention
retains the "most frequently occurring words in said documents as limited by said maximum
dictionary size" and trivial or stop words may actually be the most common (if not removed).

One difference between the claimed invention and Kostoff is that the size of the
dictionary is limited before the frequency of phrases in the document that contain words in the
dictionary is determined. This is important because the number of phrases grows exponentially
with the size of the corpus. Simply removing a list of trivial phrases may not reduce the
dictionary size (especially if the manually created list of trivial phrases finds no matches in the
dictionary). By reducing the size of the dictionary before determining the frequency of phrases
containing words in the dictionary, the claimed invention produces exponential gains in
processing speed and memory usage.
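
For illustration only, the contrast drawn above between removing a manually created list and
limiting the dictionary by size may be sketched as follows. This is a minimal sketch under stated
assumptions: the function names, the sample text, and the sample word list are hypothetical and
are taken neither from the application nor from Kostoff.

```python
from collections import Counter

def remove_listed_words(counts, trivial_words):
    """Drop every word that appears on a manually supplied list (hypothetical helper)."""
    return Counter({w: c for w, c in counts.items() if w not in trivial_words})

def limit_to_most_frequent(counts, max_size):
    """Keep only the max_size most frequent words (hypothetical helper)."""
    return Counter(dict(counts.most_common(max_size)))

# Hypothetical sample text: 11 tokens, 8 distinct words.
text = "the cache miss rate rose while the cache hit rate fell"
counts = Counter(text.split())

# A manual list that matches nothing leaves the dictionary at its full size.
manual = remove_listed_words(counts, {"of", "to", "if"})

# A size limit always bounds the dictionary, and a common word such as "the"
# stays in it unless it is removed separately.
limited = limit_to_most_frequent(counts, 3)

print(len(counts), len(manual), len(limited))
print(sorted(limited))
```

In this sketch the manually listed words happen not to occur in the text, so the first approach
removes nothing, while the size limit reduces the dictionary regardless of which words the text
contains.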

In other words, the claimed invention involves more than just reducing the dictionary to 
meet a memory constraint. In the claimed invention, the dictionary is reduced in order to 
substantially simplify the subsequent process of determining the frequency of phrases in the 
document containing words in the dictionary. 

The claimed invention first limits the dictionary to only the top number of most
frequently occurring words and then only considers phrases that contain these words. The
invention avoids maintaining a list of all potential phrases in the text corpus. The problem with
maintaining all potential phrases is that the number of phrases grows exponentially with the size
of the corpus. The invention avoids this problem by fixing the size of the dictionary up front
(user specified maximum dictionary size, M), then finding the M most frequent words and then
only creating phrases using these M most frequent words. To the contrary, the Kostoff patent
creates a list of potentially all words and N-word phrases sorted by frequency. This is not
practical for a large text corpus since such a list would be too large for most computer memory to
hold.

The Office Action admits that Kostoff does not explicitly teach the claimed process of
limiting the number of words that are used to establish the most frequently occurring phrases by
limiting the dictionary size, but the Office Action argues that such a feature would have been
obvious. More specifically, the Office Action notes that Kostoff describes that the size of the list
of trivial phrases is limited by memory constraints (col. 4, lines 42-45) and that the number of
phrases output to the user can be limited to those having high user interest, such as the top 60
most frequent phrases (col. 5, line 59-col. 6, line 64). Then, the Office Action argues that this
would motivate one to limit the dictionary size to accommodate for hardware memory
constraints.

Applicants respectfully disagree with this logical argument of obviousness for a number
of reasons, including the fact that Kostoff requires that the dictionary must include all words in
the documents (except for the trivial phrases mentioned above). More specifically, Figure 2 and
col. 4, lines 52-55 state that the system and methodology in Kostoff "is required to use the entire
full-text database to create lists of phrases." Therefore, Applicants submit that Kostoff directly
teaches away from the claimed limitation that explicitly does not use all the words from the
documents, and instead limits the dictionary to only the number of most frequently occurring
words that will fit into the limited size dictionary. When a reference teaches away from the
claimed invention it actually demonstrates that the claimed invention is not obvious.

Thus, in a first respect, since Kostoff "is required to use the entire full-text database to
create lists of phrases" it cannot teach or suggest "creating a dictionary of most frequently
occurring words in said documents as limited by said maximum dictionary size, such that said
dictionary contains less than all words in said documents" as defined by independent claims 1, 6,
and 11. This requirement in Kostoff teaches away from the claimed invention and, therefore,
Kostoff cannot teach or suggest this feature.

Further, the manner in which Kostoff would deal with memory and other limitations is
conceptually different than the claimed invention. For example, in order to deal with memory
constraints, Kostoff creates a list of trivial phrases that can be excluded from analysis (col. 4,
lines 39-49). This is essentially a fixed list in Kostoff that may or may not be effective in
limiting the memory usage. To the contrary, the claimed invention limits the size of the
dictionary, thereby providing for a more consistent and precise control of memory usage. In
addition, the processing in Kostoff always uses all words in the database (except trivial words)
and merely limits the number of phrases that are output (col. 5, line 59-col. 6, line 64). Thus,
since all words are used in the most frequent phrase processing of Kostoff, no memory is
conserved. To the contrary, the claimed invention first limits the dictionary to only the top
number of most frequently occurring words and then only considers phrases that contain these
words.

As explained on page 4, lines 4-9 of the application, the invention allows the user to
specify the size of the vector space model to be used in text clustering of a document corpus, as
well as the maximum number of words that can occur in a phrase. The invention will find all of
the phrases, up to the user specified length, that occur with the greatest frequency. The total
number of phrases returned will depend upon the user specified maximum dictionary size.

One distinction of the invention when compared to Kostoff is that the invention avoids
maintaining a list of all potential phrases in the text corpus. The problem with maintaining all
potential phrases is that the number of phrases grows exponentially with the size of the corpus.
The invention avoids this problem by fixing the size of the dictionary up front (user specified
maximum dictionary size, M), then finding the M most frequent words and then only creating
phrases using these M most frequent words. To the contrary, the Kostoff patent creates a list of
all words and N-word phrases sorted by frequency. This is not practical for a large text corpus
since such a list would be too large for most computer memory to hold.
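
By way of illustration only, the sequence described above (fix M, find the M most frequent words,
then form phrases only from those words) may be sketched as follows. This is a simplified sketch
and not the implementation disclosed in the application: the function names, the tie-breaking
rule, and the sample text are assumptions, and the sketch does not separately cap the number of
phrases that are kept.

```python
from collections import Counter

def build_dictionary(tokens, max_dictionary_size):
    """Return the M most frequent words; ties fall back to first occurrence (an assumption)."""
    return {w for w, _ in Counter(tokens).most_common(max_dictionary_size)}

def count_phrases(tokens, dictionary, max_phrase_len):
    """Count runs of 2..max_phrase_len consecutive tokens made only of dictionary words."""
    phrase_counts = Counter()
    for start in range(len(tokens)):
        for length in range(1, max_phrase_len + 1):
            end = start + length
            if end > len(tokens) or tokens[end - 1] not in dictionary:
                break  # a word outside the dictionary ends every phrase starting here
            if length >= 2:
                phrase_counts[" ".join(tokens[start:end])] += 1
    return phrase_counts

# Hypothetical sample corpus: 12 tokens, 7 distinct words.
tokens = ("data mining finds patterns and data mining scales "
          "and data mining works").split()

dictionary = build_dictionary(tokens, max_dictionary_size=3)   # M = 3
phrases = count_phrases(tokens, dictionary, max_phrase_len=3)

print(sorted(dictionary))
print(phrases.most_common())
```

Because a candidate phrase is abandoned as soon as it reaches a word outside the dictionary, the
sketch never stores phrases built from the less frequent words, which is the point of fixing the
dictionary size before any phrase is counted.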

Therefore, it is Applicants' position that Kostoff does not teach or suggest "creating a
dictionary of most frequently occurring words in said documents as limited by said maximum
dictionary size, such that said dictionary contains less than all words in said documents . . .
wherein said dictionary size limits the number of words and phrases maintained in said
dictionary" as defined by independent claims 1 and 11 and similarly defined by independent
claim 6. Previous methodologies that have suggested a lexical phrase generation technique have
not described the space and time efficient implementation for discovering such phrases that the
invention utilizes. The invention's implementation is designed to quickly find a maximal
frequency term dictionary of a given size using the smallest possible amount of memory.

Therefore, because the prior art of record does not teach or suggest the claimed invention,
Applicants respectfully submit that independent claims 1, 6, and 11 are patentable over the prior
art of record. In view of the foregoing, the Examiner is respectfully requested to reconsider and
withdraw this rejection.

B. The Rejection Based on Kostoff in view of Kirsch 
and further in view of Kobayashi and Tumey 

With respect to dependent claims 2-5, 7-10, and 12-17, the Office Action makes reference
to the prior art Kirsch, Kobayashi, and Tumey as teaching concepts such as removing
punctuation, replacing words with synonyms, removing stop words, removing duplicate words,
clustering, etc. Therefore, the additional prior art references are not utilized to teach or suggest
(and do not teach or suggest) the claimed features defined by independent claims 1, 6, and 11.
Therefore, it is Applicants' position that the proposed combination of all references still does not
teach or suggest "creating a dictionary of most frequently occurring words in said documents as
limited by said maximum dictionary size, such that said dictionary contains less than all words in
said documents . . . wherein said dictionary size limits the number of words and phrases
maintained in said dictionary" as defined by independent claims 1 and 11 and similarly defined
by independent claim 6. Therefore, it is Applicants' position that none of the prior art of record
teaches or suggests the invention defined by independent claims 1, 6, and 11 and that such
independent claims are patentable over the prior art of record.

Further, dependent claims 2-5, 7-10, and 12-17 are similarly patentable, not only by virtue
of their dependency from a patentable independent claim, but also by virtue of the additional
features of the invention they define. Therefore, Applicants submit that dependent claims 2-5, 7-
10, and 12-17 are patentable over the prior art of record and respectfully request that the
Examiner reconsider and withdraw this rejection.

II. Formal Matters and Conclusion

In view of the foregoing, Applicants submit that claims 1-17, all the claims presently
pending in the application, are patentably distinct from the prior art of record and are in condition
for allowance. The Examiner is respectfully requested to pass the above application to issue at
the earliest possible time.

Should the Examiner find the application to be other than in condition for allowance, the 
Examiner is requested to contact the undersigned at the local telephone number listed below to 
discuss any other changes deemed necessary. 

Please charge any deficiencies and credit any overpayments to Attorney's Deposit
Account Number 09-0441.



McGinn & Gibb, PLLC 

2568-A Riva Road, Suite 304 

Annapolis, MD 21401 

301-261-8071 

Customer Number: 29154 



Respectfully submitted,



Dated: 





Frederick W. Gibb, III
Registration No. 37,629 